AN ARTIFICIAL INTELLIGENCE APPROACH TO ANALOG SYSTEMS

DIAGNOSIS 


1.  INTRODUCTION 

This  report  describes  some  general  diagnostic  reasoning  techniques  that  exploit  recent  advances 
in the field of artificial intelligence (AI). They are applicable to a variety of human-engineered systems,
including hydraulic, mechanical and optical ones, but the primary focus has been on electronic
systems.  These  techniques  were  developed  over  a  period  of  several  years  of  research  in  this  area  at 
the Naval Research Laboratory. One of the products of this research is a fully implemented diagnostic
reasoning system called Fault Isolation System (FIS) that embodies these techniques and is in use
at  a  variety  of  government  and  industry  laboratories. 

The  FIS  system  is  also  described  in  considerable  detail  in  other  publications  [1,2].  The  intent  of 
this  report  is  to  present  the  methods  in  a  way  in  which  they  can  be  adapted  and  used  by  others. 

1.1  The  Need  for  Artificial-Intelligence-Based  Diagnostic  Systems 

The  maintenance  of  electronic  equipment  has  drawn  increasing  attention  during  the  past  decade 
as  a  potential  artificial  intelligence  application,  particularly  in  the  military.  This  has  been  motivated 
by the increasing complexity of military electronic systems and the resulting high maintenance costs
as  well  as  the  scarcity  of  highly  trained  technicians. 

The  Navy  has  been  particularly  interested  in  and  supportive  of  the  development  of  fault-isolation 
expert systems that can improve the quality of their maintenance and troubleshooting activities. As an
example, Navy technicians on aircraft carriers may be responsible for troubleshooting several hundred
different (sub)systems for which they have had varying amounts of training (frequently little or
none). To compensate, the Navy has invested and continues to invest heavily in automatic test equipment
(ATE) to aid or replace these technicians. The quality of the “test programs” that drive these
ATE  stations  varies  dramatically  in  spite  of  a  uniform  high  cost  to  acquire  them. 

For the past few years, we have had the opportunity to explore the use of artificial intelligence
(AI) and expert system technology in this setting. We feel that we have evolved a set of
methods that directly addresses the issues discussed above, and we have shown the effectiveness of
these methods by implementing a diagnostic system (FIS) that embodies these techniques.

1.2  A  Statement  of  the  Diagnosis  Problem 

Since  there  are  many  different  kinds  of  diagnosis  problems,  it  is  important  to  understand  more 
precisely the diagnostic reasoning problem addressed in this report. We will do so by describing
what  questions  must  be  answered  by  the  diagnostic  reasoning  system,  and  what  kinds  of  knowledge 
about  the  unit  under  test  (UUT)  are  assumed  to  be  available  to  support  the  diagnostic  process. 

Manuscript approved April 27, 1989.

The questions to be answered by the diagnostic system are intuitively quite simple. We want the
system  to  be  able  to  recommend  at  any  point  the  next  best  test  to  make  on  the  UUT  from  a  set  of 
predefined available tests, and we want it to estimate the probability that a given replaceable component
(or Boolean combination of components) is faulty, given some test results. These are usually
done  cyclically,  as  shown  in  Fig.  1. 


Fig.  1  —  Top  level  activities 
of  a  diagnosis  system 


To  decide  what  kind  of  knowledge  about  a  UUT  must  be  available  and  how  it  should  be 
represented  in  machine-readable  form  for  effective  diagnosis  is  a  more  difficult  problem.  In  the  field 
of  artificial  intelligence  this  is  known,  in  general  terms,  as  the  knowledge  representation  problem. 
Closely related to this is the knowledge acquisition problem: having chosen a representation, how
easy or difficult is it to get the required knowledge about a particular problem into this format? In
many  AI  applications,  knowledge  acquisition  is  a  crucial  consideration  because  it  can  involve  a  high 
cost  in  terms  of  time  and  money.  This  is  also  true  of  fault  diagnosis.  To  create  a  computer  program 
that  can  troubleshoot  a  piece  of  equipment  with  useful  accuracy  and  efficiency,  a  considerable  amount 
of  effort  must  be  invested  in  knowledge  acquisition.  Therefore,  we  wish  to  represent  the  UUT  with  a 
minimal amount of knowledge that still allows effective diagnostic reasoning.

Let us briefly consider some different approaches to diagnosis in light of the relative ease or difficulty
of the knowledge acquisition required to support them. First, consider the approach of directly
writing a test decision tree. This corresponds to the conventional practice of writing programs to control
ATE in which the burden of diagnostic reasoning lies primarily on the human. What the human enters
into  the  computer  is  essentially  a  decision  tree  that  will  control  the  ATE  autonomously  or  with 
minimal  human  interaction.  Although  converting  such  a  decision  tree  into  computer  code  is  usually 
not  very  labor-intensive,  considerable  time  and  effort  are  usually  involved  in  producing  the  decision 
tree  itself,  because  the  human  needs  to  think  through  a  large  number  of  diagnostic  situations.  This 
decision  tree  approach,  therefore,  has  high  knowledge  acquisition  cost.  It  also  has  the  disadvantages 
of  limited  fault  coverage,  inflexibility,  lack  of  explanation  capability,  and  susceptibility  to  human 
error and suboptimal decisions.

At  another  extreme  is  the  approach  of  entering  into  the  computer  a  detailed  circuit-level  model 
of  the  UUT  sufficient  to  simulate  its  behavior.  This  approach  has  the  possibility  of  low  knowledge 
acquisition  cost  since  CAD/CAM  data  could  be  used  to  provide  the  information  in  computer-readable 
form,  and  only  the  important  aspects  of  behavior  (performance  specifications)  would  need  to  be 
largely  human  specified.  The  difficulty  with  this  approach  lies  in  our  inability  to  use  such  a  low-level 
description of a UUT efficiently for diagnosis. Even simple circuit-level simulations can take hours
of  computer  time,  although  this  may  improve  in  the  future  with  advances  in  computing  hardware  and 
simulation  methods. 

Consequently, a primary focus of our research has been to develop a knowledge representation
scheme that is easy to acquire and useful (in a practical sense) for developing efficient diagnostic systems.
We have achieved that goal by exploiting emerging ideas about knowledge representation from the AI
community.

1.3  Related  Work  in  AI 

Recently,  some  notable  work  has  been  directed  at  some  of  the  same  problems  we  address  here. 
For example, DeKleer [3] gives a clear analysis of the problems of computing fault hypothesis probabilities,
test result probabilities, and the use of entropy for test recommendation. However, this
approach  requires  a  strong  UUT  model;  one  must  be  able  to  predict  module  output  values  from 
module  input  values.  It  is  often  impractical  to  provide  such  a  model  for  analog  systems,  although 
digital  systems  lend  themselves  well  to  this  approach.  Also,  computational  efficiency  is  not 
addressed.  For  the  results  to  be  used  widely  in  applications,  one  needs  fast  probability  and  entropy 
algorithms  and  module  simulations.  Genesereth  [4]  and  Davis  [5]  have  done  considerable  work  in  the 
area of diagnosis based on structural and functional UUT descriptions, but the focus has been on digital
systems. Other recent work [6], based on Bayesian belief networks [7], provides a powerful
mathematical  approach  to  treating  the  joint  statistics  of  component  faults  and  signal  abnormalities. 

1.4  Summary  of  Our  Approach 

When  looking  at  AI  technology,  it  is  tempting  to  conclude  that  one  could  significantly  improve 
this sort of troubleshooting activity with reasonably straightforward applications of current expert system
technology. However, several aspects of the problem raise significant technical issues. First,
with  several  hundred  different  systems  to  maintain,  it  is  infeasible  to  think  in  terms  of  independently 
developed  expert  systems  for  each  one.  Rather,  one  thinks  in  terms  of  a  more  general  fault  isolation 
shell that provides a common knowledge acquisition/representation scheme for use with all subsystems.
However, even with this level of generality, there are still several hundred knowledge bases to
be built, debugged, and maintained in a context in which there can be considerable overlap and/or
similarity in the content of many of the knowledge bases. These observations strongly suggest the
development of a sophisticated knowledge acquisition system that can be used to facilitate the construction
of a new knowledge base for a specific system in a variety of ways including reusing and/or
adapting  existing  knowledge  modules. 

Compounding  the  problem  of  applying  current  expert-system  technology  is  the  fact  that,  for 
many of the subsystems being maintained, little human expertise exists in the traditional sense of finding
people who are good at fixing particular subsystems and capturing their knowledge in sets of associative
rules. This is particularly true for newly developed systems for which empirical experience is
absent.  Rather,  technicians  depend  heavily  on  the  structural  and  functional  descriptions  contained  in 
the  technical  manuals  of  the  many  subsystems  they  attempt  to  maintain.  This  suggests  that  simple 
rule-based  architectures  are  not  likely  to  be  sufficient  for  the  task  at  hand,  and  that  a  model-based 
approach  may  be  appropriate. 

At  the  same  time,  it  is  clear  (as  discussed  earlier)  that  detailed  quantitative  models  are  in  general 
too inefficient for effective diagnosis. As a consequence, we have adopted an intermediate approach
of  providing  to  the  diagnostic  system  a  simplified  model  of  the  qualitative  behavior  of  the  replaceable 
modules and the structure (connectivity) of the UUT that is both easy to acquire and can be used efficiently
for diagnosis. We refer to these models as qualitative causal models.

By using this form of knowledge representation, we have been able to develop an effective diagnostic
system with the following notable features:

(a)  the  ability  to  do  accurate  diagnosis  by  using  a  qualitative  behavior  model  of  a  complex 
analog/digital  UUT  without  simulation, 

(b)  the  possibility  of  efficient  UUT  knowledge  acquisition, 

(c)  an  efficient  probabilistic  reasoning  method  specialized  for  device  troubleshooting  based  on 
Bayesian  principles, 

(d)  a  natural  treatment  of  multiple  faults,  and 

(e)  an  efficient  method  for  computing  the  entropy  of  a  complex  system  for  use  in  best  test 
selection. 

The  core  set  of  techniques  that  provide  these  features  is  described  in  more  detail  in  the  sections 
that  follow.  The  techniques  range  from  very  general  ones,  such  as  the  algorithms  for  computing  the 
probability  and  the  entropy  of  an  arbitrary  Boolean  expression,  to  highly  domain-specific  ones,  such 
as  the  heuristic  methods  for  certifying  modules  after  passed  tests. 

2.  QUALITATIVE  CAUSAL  MODELS 

As  previously  mentioned,  a  central  element  in  our  approach  to  diagnosis  is  the  capture  of  the 
important  aspects  of  the  behavior  of  a  UUT  for  diagnostic  purposes  in  a  qualitative  causal  model.  A 
qualitative  causal  model  of  a  UUT  consists  of:  (a)  a  causal  description  of  the  set  of  replaceable 
modules (determined by the level to which fault isolation is to occur), (b) a description of the connectivity
(structure) of the replaceable modules, and (c) a set of possible tests from which diagnostic
sequences  can  be  constructed.  Also,  the  model  can  contain  (if  available)  estimates  of  a  priori  failure 
rates as well as the costs of making tests and replacing modules.

The diagnostic power comes from the requirement that the description of each replaceable
module  must  include  a  set  of  local  causal  rules  describing  its  behavior.  The  diagnostic  system  then 
uses  both  the  structural  (connectivity)  information  and  the  sets  of  local  causal  rules  to  determine  the 
set of possible global causes (ambiguity set) of any failed test. This information is used in conjunction
with a priori fault probabilities (which are assumed to be uniform if not readily available) by an
efficient  Bayesian  algorithm  to  estimate  posterior  probabilities  of  module  faults  and  test  outcomes. 
These  are  used  along  with  test  cost  data  and  module  replacement  costs  (both  of  which  are  assumed  to 
be uniform if unavailable) to make the next best test or replacement recommendations. Two strategies
are described for computing the next best test; one is based on specialized heuristics and one is
based  on  information  theoretic  entropy.  The  latter  uses  an  efficient  new  algorithm  for  computing  the 
Shannon  entropy  of  a  complex  system.  An  important  advantage  of  our  approach  to  diagnosis  is  that 
no  single-fault  assumptions  need  to  be  made. 

2.1  Local  Causal  Rules 

The  heart  of  the  UUT  description  is  a  network  of  local  qualitative  causal  rules  that  relate 
measurable  (at  least  in  principle)  parameters  among  various  terminals  of  each  module.  These  are 
intended  to  describe  the  behavior  of  each  module  in  its  context.  Thus  where  loading  and  other  effects 
of Kirchhoff's laws affect behavior, these effects are assumed to have been taken into account. We are
not restricted to unidirectional modules with fixed I/O voltage behavior. There are two types of
rules: through-rules, containing information about the correct behavior of UUT modules, and module-rules,
containing information about fault modes of modules, as discussed below. The forms of the two
types  are  as  follows: 

Through-rule: 

Given precondition, parameter1 abnormality1 at terminal1 can cause parameter2 abnormality2
at terminal2.

Module-rule: 

Faulty  modulename  can  cause  parameter  abnormality  at  terminal. 

A  precondition  is  a  predicate  on  the  state  of  the  UUT  inputs  (or  input  history).  Preconditions  are 
optional  and  are  used  primarily  to  model  devices  such  as  multiplexers  and  devices  with  enable  inputs 
in  which  a  causal  path  is  present  or  absent  depending  on  some  signal  conditions.  Although  we  have 
experimented with elaborate precondition schemes that involve estimating probabilities of control signal
values, we have found it quite adequate to simply include in the description of each test the states
of the rule preconditions. An abnormality is a qualitative value such as high or low. For example, a through-rule is “Given that power-supply-3 is
in  the  “on”  state,  dc  voltage  high  at  t3  can  cause  frequency  low  at  t4.”  This  might  describe  the  way 
a  voltage-controlled  oscillator  propagates  a  voltage  error,  given  that  its  power  supply  voltage  is  within 
specification.  In  the  case  of  a  module  with  two  inputs  that  both  affect  the  same  output  parameter, 
such as a summing junction, a problem can arise. Suppose two of the rules are “input1 dc high can
cause output dc high” and “input2 dc high can cause output dc high.” Then if “output dc high” is
suspected, both inputs are to be suspected high. However, one might be very high and the other
slightly low, and the latter would be missed. In such a case, the system will still find a
fault leading to the high input, and the ensuing repair may even cure the slightly low input. Otherwise,
a second diagnostic pass can search for it. Each causal rule is associated with the module
immediately connected to the terminals appearing in the rule.

Through-rules represent the correct behavior of a module in context in the form of qualitative
relationships  among  quantities  at  its  terminals.  Module-rules  represent  limited  knowledge  about  how 
a  given  module  can  fail.  Note  that  even  if  we  had  no  such  fault  model  knowledge  available,  we  could 
allow  the  system  to  include  all  conceivable  module  rules  by  default  and  still  have  a  viable  diagnosis 
system.  That  is,  we  could  assume  that  any  abnormality  at  a  terminal  could  have  been  caused  by  the 
failure  of  any  directly  connected  module.  This  would  lead  to  somewhat  larger  ambiguity  sets  (suspect 
sets)  than  it  would  with  a  more  minimal  set  of  module-rules.  However,  most  of  the  diagnostic  power 
of this kind of model comes from through-rules, since it is largely by chaining them that we determine
what portion of the UUT affects a given measurement. When in doubt about the inclusion of
any  rule,  we  must  include  it,  else  we  risk  missing  some  faults. 
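
The two rule forms can be captured in a small data structure. The following is only a minimal sketch, not the representation used in FIS: the class names `Signal`, `ThroughRule`, and `ModuleRule`, the helper `default_module_rules`, and the module name `vco` are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Signal:
    """A qualitative abnormality of one parameter at one terminal."""
    terminal: str
    parameter: str
    abnormality: str


@dataclass(frozen=True)
class ThroughRule:
    """Correct behavior: an abnormal cause signal can produce an abnormal effect signal."""
    module: str
    cause: Signal
    effect: Signal
    precondition: Optional[str] = None  # optional predicate on the UUT input state


@dataclass(frozen=True)
class ModuleRule:
    """Fault mode: a faulty module can produce an abnormal signal at one of its terminals."""
    module: str
    effect: Signal


def default_module_rules(module: str, terminals: List[str],
                         parameters: List[str],
                         abnormalities: List[str]) -> List[ModuleRule]:
    """Fallback when no fault-mode knowledge exists: every conceivable module-rule."""
    return [ModuleRule(module, Signal(t, p, a))
            for t in terminals for p in parameters for a in abnormalities]


# The voltage-controlled oscillator example from the text (module name assumed).
vco_rule = ThroughRule(
    module="vco",
    cause=Signal("t3", "dc voltage", "high"),
    effect=Signal("t4", "frequency", "low"),
    precondition="power-supply-3 on",
)
```

Generating all module-rules by default, as in `default_module_rules`, corresponds to the conservative choice described above: it enlarges the ambiguity sets but does not risk missing a fault.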

2.2  The  Need  for  Predefined  Tests 

It is necessary to predefine a set of tests for a diagnostic system because, no matter how complete
a model of the structure and input/output behavior of a UUT is used, the system cannot know the
intended use of the UUT. For example, for a radar receiver, a diagnostic system must know at what
frequencies the receiver must operate correctly, and what tested parameters must be correct to within what
tolerance.


Our  notion  of  a  test  is  a  specified  terminal  to  be  used  as  a  test  point,  an  electrical  (or  other) 
parameter to be measured there, and a set of abnormalities such as {bad} or {hi, lo} with associated
numerical ranges. A range of “ok” is also required. Note that the triple “terminal parameter abnormality”
is of the same form as the right-hand side (effect) of a causal rule. In fact each such triple
occurring  in  a  test  must  occur  in  the  right-hand  side  of  at  least  one  rule.  Also,  the  states  of  all  rule 
preconditions,  if  any,  must  be  given  for  each  test,  as  discussed  in  Section  2.1.  Finally,  each  test 
should  have  a  prescribed  stimulus  setup  state.  This  is  simply  a  unique  symbol  for  each  distinct  state 
of  the  input  stimuli  of  the  UUT  used  in  the  tests.  It  is  useful  in  determining  the  relevance  of  a  passed 
test  to  a  suspect  module  (Section  3.4). 

2.3  Ambiguity  Sets 

Central  to  our  approach  is  the  notion  of  an  ambiguity  set.  One  ambiguity  set  is  associated  with 
each  abnormal  result  of  each  defined  test,  and  it  is  defined  to  be  the  set  of  all  modules  that  can  fail  so 
as to cause that failed test result. If one can obtain such sets, then one can determine what combinations
of good and bad modules are possible after more than one test has failed. In Boolean terms,
each ambiguity set is a disjunction of propositions that the included modules are faulty, and two or
more ambiguity sets represent a conjunction of such disjunctions. For example, the ambiguity sets {1, 2}
and {1, 4} represent the fact that at least one of modules one and two is faulty and that at least one
of modules one and four is faulty. In Boolean terms, $(x_1 \lor x_2) \,\&\, (x_1 \lor x_4) = 1$, where $x_i$ is the
proposition that module $i$ is faulty. Note that ambiguity sets alone do not allow the representation of
an  arbitrary  Boolean  function.  For  example,  they  cannot  represent  the  exclusive  or  of  two  modules, 
but  we  believe  that  our  forms  cover  the  majority  of  cases  occurring  in  diagnostic  applications,  and  the 
separability  of  the  ambiguity  sets  is  convenient  for  undoing  a  test  already  made. 

In diagnostic applications, we find it useful to precompute the ambiguity set for each failed outcome
of each test. Also, for each module occurring in an ambiguity set it is useful to precompute
immediate effects. These are simply the signal abnormalities at the terminals of the module that both
lead causally to the failed test result and occur as the right-hand side of a rule whose left-hand
side is the module. Immediate effects are useful as a simple approximation of failure modes (see Section
3.4).

The  general  probabilistic  techniques  described  in  Sections  3  and  4  can  be  applied  no  matter  how 
the ambiguity sets are obtained, but if the causal model of Sections 2.1 and 2.2 is used, then they can
be found by following the network of local causal rules upstream from each failed test result until all
modules causally upstream are found. If ambiguity sets are directly produced by humans, they constitute
global symptom/cause association rules.
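
The upstream chaining just described can be sketched as a breadth-first search over the rule network. This is an illustrative sketch under simplifying assumptions, not the FIS implementation: signals are plain `(terminal, parameter, abnormality)` tuples, preconditions are ignored, and the three-module chain (`osc`, `amp`, `filter`) is a made-up example.

```python
from collections import deque


def ambiguity_set(failed, through_rules, module_rules):
    """Return the set of modules that can cause the failed test result.

    failed        -- (terminal, parameter, abnormality) triple of the failed test
    through_rules -- list of (cause_triple, effect_triple) pairs
    module_rules  -- list of (module_name, effect_triple) pairs
    """
    suspects = set()
    seen = {failed}
    queue = deque([failed])
    while queue:
        effect = queue.popleft()
        # A module-rule whose effect matches makes that module a suspect.
        for module, rule_effect in module_rules:
            if rule_effect == effect:
                suspects.add(module)
        # A through-rule whose effect matches moves the search one step upstream.
        for cause, rule_effect in through_rules:
            if rule_effect == effect and cause not in seen:
                seen.add(cause)
                queue.append(cause)
    return suspects


# Hypothetical chain: osc -> amp -> filter, measured at terminal t3.
module_rules = [
    ("osc", ("t1", "dc", "high")),
    ("osc", ("t1", "dc", "low")),
    ("amp", ("t2", "dc", "high")),
    ("filter", ("t3", "freq", "low")),
]
through_rules = [
    (("t1", "dc", "high"), ("t2", "dc", "high")),
    (("t2", "dc", "high"), ("t3", "freq", "low")),
]
suspects = ambiguity_set(("t3", "freq", "low"), through_rules, module_rules)
```

Because the search follows only rules whose effect matches the abnormality in question, different abnormal results at the same test point can yield different ambiguity sets, which a purely topological model cannot provide.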

Figure  2  illustrates  the  diagnostic  power  of  our  simple  causal  rule  model.  Three  failed  tests  at 
the  same  test  point  are  shown,  all  with  different  ambiguity  sets.  The  ambiguity  sets  can  be  verified 
by  visually  chaining  from  effect  to  cause  through  the  given  list  of  causal  rules,  starting  at  a  test  result 
and ending at a set of suspect modules. Earlier model-based diagnostic systems [8,9] are not as
specific  because  they  represent  only  the  topology  of  the  UUT.  Such  systems  suspect  all  upstream 
modules  from  any  failed  test. 

3.  THE  TREATMENT  OF  FAULT  PROBABILITIES 

This  section  describes  an  analysis  of  the  joint  statistics  of  UUT  faults.  A  UUT  is  regarded  as 
having $2^n$ fault states, since the proposition $x_i$ that module $i$ of the $n$ modules is faulty can be true or
false. The probability of an arbitrary fault hypothesis, represented as a Boolean function of the $x_i$, is
developed  in  stages.  First  we  consider  a  UUT  selected  from  the  entire  population  of  a  given  UUT 
type,  then  a  UUT  from  the  population  actually  undergoing  diagnosis,  and  finally  a  UUT  for  which 
various  combinations  of  passed  and  failed  tests  have  been  made. 

In problem domains for which models relating structure and behavior are incomplete and uncertain,
such as medical diagnosis, it is often impossible to do probabilistic reasoning in a precise
manner,  since  the  relevant  joint  statistics  are  unavailable.  Therefore,  approximate  methods  have  been 
found  to  be  useful,  such  as  those  in  MYCIN  [10]  and  PROSPECTOR  [11]  and  the  Dempster-Shafer 
[12] formalism. However, in the domain of diagnosis of human-engineered systems constructed from
discrete  modules,  it  is  possible  to  use  more  precise  probabilistic  methods  for  exploiting  the  known 
causal  and  statistical  relationships  of  this  domain.  For  example,  the  information  gleaned  from  a  failed 
test  can  often  be  described  as  an  ambiguity  set,  a  set  within  which  at  least  one  faulty  module  must  lie. 
Also  it  is  often  an  acceptable  approximation  to  assume  that  the  replaceable  modules  fail  independently 
of  one  another  with  their  own  a  priori  probabilities. 

3.1  A  Priori  Probability  Model 

We assume that over the entire population of a UUT type the individual module fault propositions
$x_i$ are statistically independent of each other with a priori probabilities $a_i$, so that $a_i a_j$ is the a
priori probability of $x_i \,\&\, x_j$, for example. We are thus ignoring the case of one component failing
first  and  stressing  a  second  component  so  that  it  fails  also.  This  will  cause  some  error  in  probability 
calculations,  which  will  affect  primarily  the  selection  of  tests  and  therefore  the  cost  (time,  money, 
etc.)  of  diagnosis  rather  than  its  accuracy.  Barring  this  coupling,  we  can  view  double  faults  as 
independent  events,  the  second  of  which  occurred  by  chance  in  the  time  between  the  occurrence  of 
the first and the time of testing.

A diagnostic system does not see the population described above, however. We assume that it
sees  a  population  which  differs  in  that  a  certain  number  of  good  UUTs  are  omitted,  so  that  one  is 
more  likely  to  perform  diagnosis  on  UUTs  that  have  malfunctioned.  Specifically,  suppose  we  know 
empirically  that  P0  is  the  probability  that  a  UUT  about  to  undergo  diagnosis  is  faulty.  Now  let  us 
refer to the two-module UUT example of Fig. 3, in which area represents probability. Each of the $2^n$
rectangles represents the probability of one of the fault states (each module good or faulty) of the
UUT. The crosshatched area of the upper left represents the good UUTs that are preferentially omitted
in the population undergoing diagnosis, so that the remaining area in the “all modules good” rectangle
divided by the entire remaining area is $1 - P_0$, the probability that the UUT is good at the
start  of  diagnosis.  This  analysis  extends  to  arbitrary  dimensionality  (number  of  modules)  n,  but  it  is 
not  amenable  to  graphical  description. 


Fig.  3  —  Probability  analysis  before  any  test  fails; 
a two-module graphical representation


At the start of diagnosis, the probability that module $i$ is faulty is

$$P_i = \frac{P_0 \, a_i}{1 - \prod_{j=1}^{n} (1 - a_j)}.$$

This  is  simply  the  probability  P0  that  there  is  a  fault  times  the  fraction  of  the  “faulty  area”  covered 
by  module  i. 
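
As a small numerical illustration of this starting point, the sketch below computes the module fault probabilities as the probability `p0` that the UUT is faulty times `a[i]` divided by the total "faulty area" (one minus the product of the `1 - a[j]`). The function name and the numbers are hypothetical, chosen only for the example.

```python
from math import prod


def initial_fault_probabilities(a, p0):
    """Module fault probabilities at the start of diagnosis.

    a  -- a priori fault probabilities a_i, assumed independent
    p0 -- probability that a UUT about to undergo diagnosis is faulty
    """
    # Total probability of every fault state except "all modules good".
    faulty_area = 1.0 - prod(1.0 - a_j for a_j in a)
    return [p0 * a_i / faulty_area for a_i in a]


# Two-module example with made-up values.
probs = initial_fault_probabilities([0.1, 0.2], p0=0.9)
```

The resulting values can sum to more than `p0`, because a fault state with several faulty modules contributes to the probability of each of them.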

3.2  Calculating  Fault  Probabilities  After  Passed  Tests 

We now consider how to update the overall UUT fault probability and the module fault probabilities
if one or more tests have been made and passed, and none have failed. We make the simplifying
assumption that the information gleaned from passed tests can be represented by a single
number $c_i$ for each module $i$. This is called the certification factor and is initially 1.0. It approaches
zero as successive passed tests are made. It represents the factor by which the relative a priori fault
probability $a_i$ of module $i$ is reduced by the passing of tests. The manner in which we compute $c_i$
needs careful consideration and is discussed in Section 3.4. Thus, in Fig. 3 the $a_i$ become $a_i' = c_i a_i$
after at least one test passes.

The effect of this on the current probability $P_{UUT}$ that the UUT is faulty is given by the following
equation. It simply reflects a renormalization of the probabilities of the $2^n$ rectangles of Fig. 3
after removing the area between the “all good” rectangle and the dotted lines. Then $P_{UUT}$ is the
sum of the probabilities of all areas other than the “all good” area.

$$P_{UUT} = 1 - \frac{1}{1 + \dfrac{P_0 \left(1 - \prod_j (1 - c_j a_j)\right)}{(1 - P_0)\left(1 - \prod_j (1 - a_j)\right)}}$$


In a similar vein, each module fault probability $P_i$ becomes the sum of the probabilities of the $2^{n-1}$
rectangles for which that module is faulty, after the renormalization mentioned above.

$$P_i = \frac{P_0 \, c_i a_i}{(1 - P_0)\left(1 - \prod_j (1 - a_j)\right) + P_0\left(1 - \prod_j (1 - c_j a_j)\right)}$$
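
The passed-test update can be sketched numerically as follows. The sketch rests on an assumed reading of the renormalization: the "all good" outcome is given weight (1 - P0)(1 - prod(1 - a_j)), the faulty outcomes together are given weight P0(1 - prod(1 - c_j a_j)), and module i is given weight P0 c_i a_i. The function name and the numbers are hypothetical.

```python
from math import prod


def probabilities_after_passed_tests(a, c, p0):
    """Return (P_UUT, [P_i]) after passed tests, under the assumed weighting.

    a  -- a priori module fault probabilities a_i
    c  -- certification factors c_i (1.0 means no passed test bears on module i)
    p0 -- probability that the UUT was faulty at the start of diagnosis
    """
    good_weight = (1.0 - p0) * (1.0 - prod(1.0 - a_j for a_j in a))
    faulty_weight = p0 * (1.0 - prod(1.0 - c_j * a_j for a_j, c_j in zip(a, c)))
    total = good_weight + faulty_weight
    # faulty / (good + faulty) is algebraically 1 - 1 / (1 + faulty / good).
    p_uut = faulty_weight / total
    p_modules = [p0 * c_i * a_i / total for a_i, c_i in zip(a, c)]
    return p_uut, p_modules


a = [0.1, 0.2]
p_uut_start, p_start = probabilities_after_passed_tests(a, [1.0, 1.0], p0=0.9)
p_uut_later, p_later = probabilities_after_passed_tests(a, [0.5, 0.5], p0=0.9)
```

With all certification factors equal to 1.0 the result reduces to the starting values (P_UUT equals P0), and lowering the factors lowers the probability that the UUT is faulty, as one would expect after tests pass.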


3.3  Calculating  Fault  Probabilities  After  Failed  Tests 


For  clarity,  we  first  describe  the  processing  of  failed  test  results  only.  Then  we  treat  the  case  of 
some  failed  and  some  passed  tests.  As  described  in  Section  2.3,  each  failed  test  gives  rise  to  an 
ambiguity  set  containing  all  modules  that  could  have  caused  the  abnormal  measurement.  The  set  of 
these  ambiguity  sets  corresponds  to  a  Boolean  conjunctive  normal  form  expression  in  the  module  fault 
propositions $x_i$: at least one member of the first ambiguity set is faulty, and at least one member of
the second ambiguity set is faulty, etc. Let $T$ denote this expression and $H$ denote an arbitrary
Boolean function of the $x_i$ and their complements, describing a fault hypothesis whose current probability
is desired. A fault hypothesis is defined here as a Boolean combination of assertions that individual
modules are good or faulty. For example, $x_1 \,\&\, \bar{x}_2 \,\&\, x_3$ is the hypothesis that modules 1 and 3
are bad and module 2 is good, without regard for the other modules.


From  Bayes'  Rule  the  current  probability  of  H  is  given  by 


$$P(H \mid T) = \frac{P(H \,\&\, T)}{P(T)}, \qquad (1)$$


where $P(B)$ is defined for an arbitrary Boolean function $B$ as the a priori probability of the proposition
$B$, i.e., the a priori probability that the fault state of the UUT is consistent with $B$. $P(B)$ is
expressible as

$$P(B) = \sum_{i \,:\, B = 1} A_i,$$

where the sum runs over the fault states $i$ for which $B$ is true,

$$A_i = \prod_k f(a_k), \quad \text{and}$$

$$f(a_k) = \begin{cases} a_k, & \text{if the $k$th bit of the binary representation of } i = 1, \\ \bar{a}_k, & \text{if the $k$th bit of the binary representation of } i = 0, \end{cases}$$

where $\bar{a}_k$ denotes $1 - a_k$. The complexity of computing $P(B)$ with the explicit summation is $O(2^n)$.
Therefore we introduce a more efficient algorithm for this computation.
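
For reference, the explicit summation and Eq. (1) can be written out directly. The sketch below is the brute-force O(2^n) method, not the efficient algorithm introduced next; the function names `prob` and `posterior` and the a priori values are hypothetical, while the two ambiguity sets {1, 2} and {1, 4} are the example of Section 2.3.

```python
from itertools import product


def prob(b, a):
    """A priori probability of Boolean function b by summing over all 2**n fault states.

    b -- function taking a tuple of booleans (True = module faulty) and returning bool
    a -- a priori module fault probabilities, assumed independent
    """
    total = 0.0
    for state in product([False, True], repeat=len(a)):
        if b(state):
            weight = 1.0
            for faulty, a_k in zip(state, a):
                weight *= a_k if faulty else 1.0 - a_k
            total += weight
    return total


def posterior(h, t, a):
    """P(H | T) = P(H & T) / P(T), Eq. (1)."""
    return prob(lambda s: h(s) and t(s), a) / prob(t, a)


# Modules 1..4 map to indices 0..3; the probabilities are made up.
a = [0.1, 0.2, 0.3, 0.4]
t = lambda s: (s[0] or s[1]) and (s[0] or s[3])  # ambiguity sets {1, 2} and {1, 4}
h = lambda s: s[0]                               # hypothesis: module 1 is faulty
p_t = prob(t, a)
p_h_given_t = posterior(h, t, a)
```

The loop visits every fault state, so the running time doubles with each added module; this is the cost that motivates the algorithm below.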

Our algorithm computes $P(B)$ for an arbitrary Boolean function of the literals $x_i$, although $B$
must be given in conjunctive normal form. The strategy is to manipulate $B$ so that all terms of conjunctions
are statistically independent (involve no common $x_i$) and all terms of disjunctions are nonoverlapping
(mutually exclusive). We then can replace the $x_i$ with the corresponding a priori component
failure probabilities $a_i$ and the three Boolean operations with arithmetic operations using the
following three laws from probability theory:

A.  If  P(X)  =  p  then  P(X)  =  1  -  p. 

B.  If  P(X)  =  p  and  P(Y )  =  q,  then  P(XvY)  =  p  +  q  iff  X& Y  =  0. 

C.  If  P(X)  =  p,  P(Y)  =  q,  and  X  and  Y  are  statistically  independent,  then 

P(X&Y)  =  p  x  q. 

We  perform  the  Boolean  manipulation  with  the  following  algorithm.  We  assume  that  B  is 
presented  to  the  algorithm  in  conjunctive  normal  form.  Complexity  will  be  discussed  in  terms  of  the 
number  a  of  conjuncts  at  the  top  level  of  B  and  the  number  n  of  literals  occurring.  Note  that  in  the 
diagnosis  application  a  is  the  number  of  ambiguity  sets  (one  per  failed  test)  and  n  is  the  number  of 
modules.  An  example  follows  the  algorithm  for  clarity. 

Boolean  Manipulation  Algorithm: 

Step  1  —  Perform  Boolean  absorption  to  remove  all  redundant  conjuncts. 

Step  2  —  Apply  DeMorgan’s  Law  to  B  so  that  it  is  in  complemented  disjunctive  normal  form. 

Step 3 — Order the disjuncts by length (number of conjuncts $x_i$ or $\bar{x}_i$), longest leftmost.

Step  4  —  Correct  the  disjuncts  for  overlap  with  others,  from  left  to  right,  by  ANDing  certain  terms 
onto  each  (to  be  elaborated  later).  The  entire  conjunctive  expression  thus  appended  to  each  disjunct 
will  be  called  a  correction  expression.  Terminate  the  algorithm  if  the  conjuncts  in  this  correction 
expression  are  all  literals. 

Step  5  —  To  each  correction  expression,  apply  this  algorithm  recursively,  starting  at  Step  1. 

Steps 1 and 4 of the above Boolean manipulation algorithm need additional explanation. In Step
1, we remove all redundant conjuncts as follows. For each conjunct, compare it with each conjunct
to its right. Whenever a pair is encountered such that all terms of one are present in the other,
delete the larger conjunct. For example, $(X \vee Y) \,\&\, (X \vee Y \vee Z) = X \vee Y$. The complexity of Step 1 is
$O(a^2 n)$, assuming that the literals were presorted within each conjunct.

Step  4  (the  overlap  correction)  of  the  Boolean  manipulation  algorithm  proceeds  by: 

Step 4a — Working from left to right, take the next disjunct $D_N$.

Step 4b — For each disjunct $D$ to the right of $D_N$, perform Step 4c.

Step 4c — Note whether any literal $x_i$ occurs complemented in $D$ and uncomplemented in $D_N$,
or vice versa. If so, terminate Step 4c; this pair is already mutually exclusive. Otherwise compute
$\tilde{D}$ = what remains of $D$ after removing from it those conjuncts ($x_i$ or $\bar{x}_i$) present in $D_N$, and append
(AND) the complement of $\tilde{D}$ (expressed as a disjunction, using DeMorgan's Law) onto $D_N$.

Step 4 has complexity $O(a^2 n)$, again assuming that the literals were presorted within each conjunct.

A worst-case complexity of the above algorithm is $O(a^a n)$. This follows from the fact that the
complexity of the first recursion is $O(a^2 n)$ (from Steps 2 and 4) and each additional recursion (Step 5)
introduces an extra factor of $a$ through Step 4. The number of recursions is bounded by $a$, since Step
4 produces a correction expression with at least one less top-level term than its input expression at
each recursion. There may exist a smaller upper bound than $O(a^a n)$. In practice, most functions $B$
yield few (if any) recursions, so the typical complexity is closer to $a^2 n$. Also, in the diagnosis
application, $n$ may be large (~100 modules) but $a$, the number of failed tests, is typically less than 10.
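
The recursion can be sketched numerically. The following Python sketch is an illustrative variant, not the FIS code: instead of building the manipulated Boolean expression symbolically, it applies laws A, B, and C as it goes and returns $P(B)$ directly. The names `absorb`, `lit_prob`, and `prob_cnf` are hypothetical. Literals are assumed to be encoded as signed integers (`7` for $x_7$, `-7` for $\bar{x}_7$), `a` is assumed to map module numbers to a priori failure probabilities, and no clause is assumed to contain both a literal and its complement.

```python
def absorb(clauses):
    """Step 1: drop duplicate clauses and any clause that contains another."""
    sets = list(dict.fromkeys(frozenset(c) for c in clauses))
    return [c for c in sets if not any(other < c for other in sets)]

def lit_prob(lit, a):
    """Probability of a single literal: a_i if positive, 1 - a_i if negated."""
    return a[lit] if lit > 0 else 1.0 - a[-lit]

def prob_cnf(clauses, a):
    """P(B) for B given as a list of clauses (each an iterable of literals)."""
    clauses = absorb(clauses)                                  # Step 1
    disjuncts = [frozenset(-l for l in c) for c in clauses]    # Step 2: B = NOT(OR of disjuncts)
    disjuncts.sort(key=len, reverse=True)                      # Step 3
    p_complement = 0.0
    for i, dn in enumerate(disjuncts):                         # Step 4a
        correction = []
        for d in disjuncts[i + 1:]:                            # Step 4b
            if any(-l in dn for l in d):                       # Step 4c: already exclusive
                continue
            # complement of what remains of d, as a clause (DeMorgan)
            correction.append(frozenset(-l for l in d - dn))
        p_dn = 1.0
        for l in dn:
            p_dn *= lit_prob(l, a)                             # law C on the literals of dn
        # Step 5 (recursion) and law C: correction shares no variables with dn
        p_complement += p_dn * prob_cnf(correction, a)         # law B: disjuncts now exclusive
    return 1.0 - p_complement                                  # law A

a = {i: 0.5 for i in range(2, 8)}
p = prob_cnf([{2, 3, 4}, {4, 5, 6}, {6, 7}, {7}], a)
```

With all six failure probabilities set to 0.5, the last two lines evaluate the numerator used in the example that follows and give `p == 0.390625`.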

Example: 

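The following is an example of the application of the above algorithm. In keeping with our
diagnostic topic, suppose that three tests have failed and our current ambiguity sets are (2,3,4),
(4,5,6), and (6,7). Let us compute the probability of the hypothesis $x_7$ that module 7 is faulty. The
ambiguity sets overlap interestingly (at least two faults are present) and make a good illustration. We
define the shorthand notation $\bar{P}(B) = 1 - P(B)$, and $x_i$ is denoted by $i$. Then the numerator of Eq.
(1) becomes $P(H \& T) = P[(2 \vee 3 \vee 4) \,\&\, (4 \vee 5 \vee 6) \,\&\, (6 \vee 7) \,\&\, 7]$. Note that laws A, B, and C cannot be
applied at this point to compute the numerical value of $P(H \& T)$. In particular, the four conjuncts are
not all statistically independent of one another, since they contain some common literals. Therefore,
we will apply the Boolean Manipulation algorithm:

$$\begin{aligned}
H \& T &= (2 \vee 3 \vee 4) \,\&\, (4 \vee 5 \vee 6) \,\&\, (6 \vee 7) \,\&\, 7 \\
&= (2 \vee 3 \vee 4) \,\&\, (4 \vee 5 \vee 6) \,\&\, 7; && \text{Step 1 (Boolean absorption)} \\
&= \overline{(\bar{2} \,\&\, \bar{3} \,\&\, \bar{4}) \vee (\bar{4} \,\&\, \bar{5} \,\&\, \bar{6}) \vee \bar{7}}; && \text{Step 2 (DeMorgan's law)} \\
&= \overline{(\bar{2} \,\&\, \bar{3} \,\&\, \bar{4} \,\&\, (5 \vee 6) \,\&\, 7) \vee (\bar{4} \,\&\, \bar{5} \,\&\, \bar{6} \,\&\, 7) \vee \bar{7}}; && \text{Step 4 (disjunction overlap removal)}
\end{aligned}$$

$$\begin{aligned}
(5 \vee 6) \,\&\, 7 \text{ becomes } & \overline{(\bar{5} \,\&\, \bar{6}) \vee \bar{7}}; && \text{Steps 5 and 2, recursively (DeMorgan's law)} \\
= {} & \overline{(\bar{5} \,\&\, \bar{6} \,\&\, 7) \vee \bar{7}}; && \text{Step 4 (disjunction overlap removal)}
\end{aligned}$$

Finally,

$$H \& T = \overline{\left(\bar{2} \,\&\, \bar{3} \,\&\, \bar{4} \,\&\, \overline{(\bar{5} \,\&\, \bar{6} \,\&\, 7) \vee \bar{7}}\right) \vee (\bar{4} \,\&\, \bar{5} \,\&\, \bar{6} \,\&\, 7) \vee \bar{7}}.$$

Now we can apply laws A, B, and C to all negations, disjunctions, and conjunctions, respectively:

$$P(H \& T) = 1 - \bar{P}(2)\bar{P}(3)\bar{P}(4)\left[1 - \left(\bar{P}(5)\bar{P}(6)P(7) + \bar{P}(7)\right)\right] - \bar{P}(4)\bar{P}(5)\bar{P}(6)P(7) - \bar{P}(7).$$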

The $P(i)$ are simply the $a_i$. The denominator of Eq. (1), $P(T)$, can be treated similarly.
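
The closed form obtained in this example can be checked numerically. The sketch below is an illustration only: the function names are hypothetical, and the dictionary `a` of a priori failure probabilities keyed by module number is an assumption. The first function evaluates the expression produced by the Boolean manipulation; the second uses the independent observation that $P(H \& T) = a_7 \, P[(2 \vee 3 \vee 4) \,\&\, (4 \vee 5 \vee 6)]$ together with inclusion-exclusion on the two complemented clauses.

```python
def example_closed_form(a):
    """P(H&T) from the manipulated expression; q[i] plays the role of P-bar(i)."""
    q = {i: 1.0 - a[i] for i in a}
    inner = 1.0 - (q[5] * q[6] * a[7] + q[7])      # probability of (5 v 6) & 7
    return (1.0
            - q[2] * q[3] * q[4] * inner
            - q[4] * q[5] * q[6] * a[7]
            - q[7])

def example_inclusion_exclusion(a):
    """Independent check: a7 * P((2 v 3 v 4) & (4 v 5 v 6))."""
    q = {i: 1.0 - a[i] for i in a}
    both_clauses = (1.0
                    - q[2] * q[3] * q[4]
                    - q[4] * q[5] * q[6]
                    + q[2] * q[3] * q[4] * q[5] * q[6])
    return a[7] * both_clauses

a = {2: 0.10, 3: 0.20, 4: 0.30, 5: 0.15, 6: 0.25, 7: 0.05}
difference = abs(example_closed_form(a) - example_inclusion_exclusion(a))
```

The two functions agree to within floating-point rounding for any failure probabilities, which is what the manipulation is meant to guarantee.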

The  method  above  can  be  used  to  compute  the  current  fault  probabilities  of  the  individual 
modules  after  each  test.  It  can  also  be  invoked  by  the  user  to  compute  the  probability  of  any  fault 
hypothesis,  even  one  including  some  fault  states  inconsistent  with  T. 

Now we treat the case of having at least one passed test in addition to the failed test(s). As
described in Sections 3.2 and 3.4, for each module $i$ we compute a certification factor $c_i$, which
represents the evidence from passed tests. Thus module $i$ is regarded as certified by the amount $c_i$,
and the a priori failure rate $a_i$ is simply replaced with $c_i a_i$ in the probability calculations above.

3.4  Passed  Tests  and  Certification  Strategies 

Now we focus on how to compute the certification factors $c_i$ used in Sections 3.2 and 3.3.
Since we assume the unavailability of failure mode statistics here, we take a somewhat heuristic
approach. We motivate our discussion of certification with the example of Fig. 4, in which a
diagnosis system currently suspects that at least one of two modules amp1 and amp2 contains a fault. In
particular, an RMS Amplitude test fails at T3, leading us to suspect a gain problem in the upper of
two amplifier channels in both modules. Then we might be able to determine which module is likely
to be faulty by making a test dependent on amp1 and not on amp2. If the test fails, amp1 is faulty; if
the test passes, amp1 is less suspect than it was, and amp2 is more suspect. A human interpreting
this passed test would think about what failure mode of amp1 tested OK. If it were the same failure
mode that made amp1 initially suspect, amp1 would be exonerated (reduction of $c_i$ to nearly zero). If
the passed test depended on a completely independent failure mode of amp1, then amp1 would be
only slightly certified (small reduction of $c_i$).

The desiderata for the $c_i$ are as follows:

(1) Each $c_i$ should be 1.0 initially and never less than zero.

(2) Each passed test fed by module $i$ should reduce $c_i$.

(3) The different ways a module can fail are often not mutually exclusive. Thus successive
passed tests fed by module $i$ should have progressively less effect on $c_i$, all other factors
being equal.

(4) If a module is currently suspected of being faulty because one or more tests depending on
it failed, then it should be certified more strongly by some passed tests than by others. In
particular, a passed test dependent on immediate effects (an approximation of failure
modes; see Section 2.3) which are currently suspected should yield a larger reduction of $c_i$
than other passed tests depending on module $i$.

(5) A passed test that satisfies (4) above, which also was made under the same UUT stimulus
conditions, should yield an even larger reduction of $c_i$.

[Fig. 4 — Difficulty of interpreting passed tests. The figure asks the reader to consider three tests:
(1) amplitude OK at T2, (2) frequency OK at T2, and (3) amplitude OK at T5.]

In earlier versions of the implemented FIS system, we tried a simple linear scheme. For example,
if a module has 10 module-rules and four of the rules have right-hand sides on the causal path to
some test that passed, then the certification factor for the module is $c_i = 1 - 4/10 = 0.6$. Thus the a
priori fault probability of the module would be multiplied by 0.6 before being used in the probability
calculations of Section 3.3. However, this violates the above desiderata except (1).

One way to satisfy the desiderata while still avoiding requiring joint statistics data for the failure
modes of a module is by the following scheme. Keep track of how many of the tests that depend on
each immediate effect of the given module have been made and passed. Then take the sum $s$ of these
and compute $e^{-ks}$ as the certification factor for the module, with the constant $k$ chosen empirically to
optimize performance. This addresses desiderata (1), (2), and (3). Now if the module in question is
contained in any ambiguity set, i.e., if it feeds any failed test, then $e^{-ks - k_a s_a}$ is taken to be the
certification factor. Here $s_a$ is the number of passed tests that depend on those particular immediate effects
of the module that are responsible for that module being in an ambiguity set. That is, for each passed
test we note for each module on which the test depends how many of the abnormalities it can
immediately cause (at its terminals) lie on the causal path to the failed (e.g. low or high) outcomes of the
passed test. The causal connections between the various immediate effects and the test outcomes can
be found by direct lookup of precompiled ambiguity sets of tests (see Section 2.3). The constant $k_a$ is
chosen empirically to optimize performance. This satisfies desideratum (4). Finally, to cover (5),
one can extend the above exponential expression to $e^{-ks - k_a s_a - k_s s_s}$, where $s_s$ is the number of passed
tests that depend on those particular immediate effects of the module that are responsible for that
module being in an ambiguity set and which share the same setup as a failed test implicating such an
immediate effect. Many other strategies are possible for the approximate certification of modules in
the absence of failure mode joint statistics. The above strategies may suggest other ideas.
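
A minimal sketch of the exponential scheme follows. It is an illustration under stated assumptions, not the FIS implementation: the function name `certification_factor` is hypothetical, and the default values of the constants `k`, `k_a`, and `k_s` are placeholders, since the text says only that such constants are chosen empirically.

```python
import math

def certification_factor(s, s_a=0, s_s=0, k=0.5, k_a=1.0, k_s=1.0):
    """Certification factor exp(-(k*s + k_a*s_a + k_s*s_s)).

    s   : count of passed tests summed over the module's immediate effects
    s_a : passed tests on the immediate effects that put the module in an
          ambiguity set
    s_s : those of the s_a tests that share a setup with an implicating failed test
    The constants are placeholder values, to be tuned empirically.
    """
    return math.exp(-(k * s + k_a * s_a + k_s * s_s))

# The factor starts at 1.0 and each further passed test removes less than the last.
factors = [certification_factor(s) for s in range(4)]
```

Because the factor is an exponential of a nonnegative sum, it starts at 1.0, stays positive, and decreases by a smaller absolute amount with each additional passed test, which is the behavior asked for in desiderata (1) through (3).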

4.  BEST  TEST  STRATEGIES 

We define the optimal best test strategy as the one that minimizes the average total cost of the
tests required to achieve some specified degree of certainty about which modules in the UUT are
faulty and which ones are not. Optimality is an unrealistic goal in most applications, but there are
various ways to achieve adequate suboptimal performance. In the following sections we cast the best
test problem in terms of game trees and then argue for the direct evaluation of the available tests
without tree search. Two practical forms of evaluation are described: heuristic strategies, which lend
themselves to automated explanation, and a more rigorous information theoretic approach. For the
latter we introduce an algorithm for computing the entropy of a set of statistically independent
propositions constrained by an arbitrary Boolean function.

4.1  Testing  and  Game  Trees 

The  problem  of  finding  the  best  test  to  make  in  diagnostic  reasoning  is  analogous  to  the  problem 
of  playing  a  game  against  an  opponent  who  responds  randomly.  The  test  result  is  the  opponent’s 
move and the test selection is our move. The object of our side is to minimize the total cost of
finding the fault(s). We could think of the opponent as confounding our efforts by giving test results that
sometimes further our goal and sometimes do not.

One  method  that  is  optimal  in  the  sense  of  the  above  paragraph  is  the  miniaverage  algorithm 
(Fig.  5).  The  strategy  is  to  propagate  the  total  test  costs  given  at  the  terminal  nodes  back  to  the  root 
node.  This  is  done  by  recursively  computing  the  backed-up  cost  of  a  node  from  the  costs  of  its 
offspring. There are two cases. The backed-up cost of a test node is the weighted average of the
backed-up  costs  of  the  offspring  result  nodes,  where  the  weights  are  the  probabilities  of  the  results. 
The  backed-up  cost  of  a  result  node  is  the  minimum  of  the  backed-up  costs  of  its  offspring  test  nodes. 
When  the  propagation  reaches  the  root  node,  a  result  node,  the  minimizing  test  node  is  noted.  This 
node  corresponds  to  the  best  test. 
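
The back-up rule can be sketched as a short recursion. This is a minimal illustration under assumed data structures, not the FIS code: result nodes are taken to be dictionaries with either a `total_cost` (terminal) or a list `tests`, and each test node is a dictionary with a `name` and a list `results` of `(probability, result_node)` pairs. All of these key names are hypothetical.

```python
def backed_up_cost(result_node):
    """Return (backed-up cost, name of the minimizing test) for a result node."""
    tests = result_node.get("tests")
    if not tests:
        # terminal node: the total test cost is given
        return result_node["total_cost"], None
    best_cost, best_name = float("inf"), None
    for test in tests:
        # test node: probability-weighted average over its result nodes
        average = sum(p * backed_up_cost(child)[0] for p, child in test["results"])
        if average < best_cost:
            # result node: minimum over its offspring test nodes
            best_cost, best_name = average, test["name"]
    return best_cost, best_name

root = {"tests": [
    {"name": "A", "results": [(0.5, {"total_cost": 2.0}), (0.5, {"total_cost": 4.0})]},
    {"name": "B", "results": [(0.75, {"total_cost": 1.0}), (0.25, {"total_cost": 12.0})]},
]}
cost, best = backed_up_cost(root)   # test A backs up to 3.0, test B to 3.75
```

In this small tree the recursion selects test A, whose backed-up cost of 3.0 is lower than the 3.75 of test B.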

Although  this  is  an  optimal  solution,  it  is  infeasible  for  large  problems.  The  branching  factor  at 
the  result  nodes  is  then  large,  since  the  number  of  available  tests  is  typically  large.  The  tree  size  is 
roughly  this  branching  factor  raised  to  the  power  d,  the  average  depth  of  the  tree,  and  d  increases 
with  the  size  of  the  problem. 

This  problem  can  be  mitigated  by  using  the  gamma  miniaverage  algorithm  [8,13].  This  method 
performs  the  miniaverage  calculation  more  efficiently  by  computing  the  backed-up  costs  in  an  order 
that allows considerable pruning of the tree. Also, Ref. 8 reduces the depth of the tree by starting the
propagation not at the terminal nodes of the tree, but at some fixed depth, and estimating the
backed-up costs at these nodes with evaluation functions.

We  regard  a  depth  of  one  to  be  the  most  practical  choice,  since  the  number  of  available  tests  is 
often  several  hundred.  Then  the  miniaverage  method  degenerates  to  simply  iterating  once  over  the 
list  of  available  tests.  For  each,  a  weighted  average  is  taken  of  the  evaluation  function  applied  to  the 
UUT states resulting from the several possible test results, where the weights are the estimated
test-result probabilities. The minimum valued test is then recommended.

[Fig. 5 — Miniaverage methods: optimal but costly. The figure shows a portion of the miniaverage tree
below the current situation, with test-result branches labeled ok, hi, lo and ok, bad.]

4.2  Information  Gain  vs  Cost 

It is difficult to find an evaluation function that directly computes an estimate of the backed-up
cost of a UUT state without performing the tree search. However, instead of trying to pick a test
leading to a minimum expected remaining cost of testing, one can pick a test that maximizes the ratio
of the expected information gain for that test to its cost (a predefined test cost). This expected
information gain is a weighted average of the reduction in the Shannon entropy of the UUT over the
several possible test results, where the weights are the estimated test-result probabilities. The
reasoning here is that since the entropy of the UUT is a measure of how much information is required to
know its failure state exactly (each module good or bad) and the entropy monotonically decreases at
each test, there is no risk of retrogressing from our goal of certainty. Therefore, it is reasonable to
seek maximal progress in one step.
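
The selection rule can be sketched as follows. This is an illustration only: it assumes that each candidate test comes with explicit posterior distributions over complete fault hypotheses, which is feasible only for small problems, and the names `entropy`, `best_test`, and the layout of the `tests` dictionary are hypothetical.

```python
import math

def entropy(dist):
    """Shannon entropy in bits of a probability distribution."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

def best_test(current, tests):
    """Pick the test with the largest expected entropy reduction per unit cost.

    current: probabilities over the complete fault hypotheses
    tests:   dict name -> (cost, [(result_probability, posterior), ...])
    """
    h_now = entropy(current)

    def merit(name):
        cost, outcomes = tests[name]
        gain = sum(p * (h_now - entropy(post)) for p, post in outcomes)
        return gain / cost

    return max(tests, key=merit)

current = [0.25, 0.25, 0.25, 0.25]
tests = {
    "split": (1.0, [(0.5, [0.5, 0.5, 0, 0]), (0.5, [0, 0, 0.5, 0.5])]),
    "peel":  (1.0, [(0.25, [1, 0, 0, 0]), (0.75, [0, 1/3, 1/3, 1/3])]),
    "full":  (4.0, [(0.25, [1, 0, 0, 0]), (0.25, [0, 1, 0, 0]),
                    (0.25, [0, 0, 1, 0]), (0.25, [0, 0, 0, 1])]),
}
choice = best_test(current, tests)
```

Here the test named `split` gains one bit at unit cost and is preferred both to `peel`, which gains less at the same cost, and to `full`, which gains two bits but at four times the cost.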

Estimates of the probabilities of test results are needed for use in the best-test calculations. The
probability of an abnormal (such as high or low) result can be rapidly estimated by looking up the
ambiguity set for that result and noting the associated immediate effects of each module. Then an
estimate is made of the fraction $f_i$ of the current probability $P_i$ that module $i$ is faulty that is causally
related to these immediate effects, by using the certification information for module $i$. Then the
estimated test result probability is $1 - \prod_i (1 - f_i P_i)$. Simply, we estimate the amount of the
probability mass of various possible faults that causally lies upstream of the hypothesized test result.
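
The estimate is a product over the modules in the ambiguity set of the hypothesized result. A minimal sketch, assuming the fractions and current fault probabilities are already available as pairs (the function name and argument layout are hypothetical):

```python
def abnormal_result_probability(modules):
    """Estimate 1 - prod_i (1 - f_i * P_i).

    modules: iterable of (f_i, P_i) pairs, one per module in the ambiguity set
    of the hypothesized abnormal result.
    """
    none_cause = 1.0
    for f, p in modules:
        none_cause *= 1.0 - f * p   # module i does not produce the abnormality
    return 1.0 - none_cause

estimate = abnormal_result_probability([(1.0, 0.5), (0.5, 0.5)])   # 1 - 0.5 * 0.75
```

For two modules with current fault probability 0.5, one fully and one half causally related to the result, the estimate is 0.625.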

4.3  Heuristic  Strategies 

The information theoretic approach described in Section 4.2 has the advantage of theoretical
rigor, but despite the efficient entropy algorithm elaborated later in Section 4.4 it still can be too slow
for real-time applications. This is because one has to process the results of many hypothetical tests to
obtain the UUT states whose entropy is to be computed. Therefore, we have experimented with a
more heuristic approach. This approach has the advantages of speed and transparency; that is, it
allows the development of explanation software, which is of great practical importance for technician's
aid and training aid applications. It has the disadvantage that it is difficult to prove anything about
its performance.

First  we  discuss  the  case  in  which  no  test  has  yet  failed.  In  this  case,  the  objective  is  to  find  a 
test  that  fails  quickly  (i.e.,  at  low  cost)  if  there  is  any  fault.  If  there  is  not  any  fault,  then  it  doesn’t 
matter  in  what  order  the  tests  are  done;  we  simply  exhaust  a  given  list  of  tests  and  declare  the  UUT 
not  faulty.  A  reasonable  heuristic  is  then  to  select  a  test  that  maximizes  the  estimated  probability  of 
test  failure  divided  by  the  test  cost.  One  simple  and  efficient  way  to  estimate  test  result  probabilities 
is  given  in  Section  4.2. 

Next  we  discuss  the  case  in  which  at  least  one  test  has  failed  so  far.  In  this  case  the  objective  is 
to  isolate  the  fault(s)  at  minimal  cost.  One  of  the  most  useful  heuristics  is  that  tests  dependent  on 
some  but  not  all  of  a  suspect  set  (defined  below)  of  modules  are  powerful.  If  such  a  test  fails,  the 
suspect  set  tends  to  become  smaller,  essentially  by  a  process  of  intersection.  If  it  passes,  it  tends  to 
become smaller by a process of elimination; the certification process reduces the $c_i$ values of the
modules on which the test depends. A suspect set can be defined in various ways; for example, as a
set of modules whose current fault probabilities satisfy some criterion, or better, as a non-null
intersection of some or all of the ambiguity sets of failed tests. Further refining this heuristic, better
tests are those that not only depend on a subset of the suspect set but also depend on suspect immediate
effects. Better still are the tests that share the same stimulus setup with failed tests that depend on
suspect immediate effects. The latter two are directed toward achieving strong certification if the test
passes. Note the relevance of Section 3.4. Finally, tests that have low cost are preferred to tests of
equal  merit  but  with  higher  cost.  Many  variants  of  these  ideas  are  possible;  empirical  experience 
should  be  used  as  a  guide  in  specific  applications. 

4.4  Fast  Entropy  Algorithm 

The UUT entropy discussed in Section 4.2 can be expressed as follows:

$$H(T) = -\sum_{i=1}^{2^n} p_i \log p_i, \quad \text{where } \log \equiv \log_2. \tag{2}$$

Here $p_i$ is the current probability of the $i$th complete fault hypothesis (all modules hypothesized
good or bad), and $n$ is the number of modules in the UUT. Equation (2) is not in a suitable form for
efficient computation because its complexity increases exponentially with the number $n$ of modules.
For only 20 modules, $2^{20} = 1{,}048{,}576$ terms would need to be computed. Therefore, we compute this
quantity more efficiently by the drastic automatic simplification of Eq. (2) after manipulating the
Boolean function $T$ representing the failed test results to date by using the algorithm in Section 3.3.
This procedure introduces no approximations but greatly improves running time. It is also generally
applicable to the problem of efficiently computing the entropy of a set of statistically independent
propositions constrained by an arbitrary Boolean function.

We assume that the description of the state probabilities $p_i$ is that provided by the Boolean
algorithm of Section 3.3. That is, the Boolean function $B$ describing the feasible states is represented in a
form such that a negation appears at the top level, with a disjunction below that and a conjunction
below that. Each conjunct of such a conjunction is either a literal (complemented or
uncomplemented) or an expression of the form described in the preceding sentence, recursively. Finally, all
disjunctions are of mutually exclusive terms, and all conjunctions are of statistically independent terms
(no common literals).

The strategy is to decompose $H(B)$ from the top down by using several mathematical identities
about entropy expressions, until it is explicitly computable in terms of arithmetic and transcendental
functions of the a priori probabilities $a_i$ of the Boolean literals. Thus, our efficient entropy algorithm
is in the same spirit as the efficient probability algorithm of Section 3.3, although the details are
different. Gallager [14] provides a good background for the remainder of Section 4.4.

Definitions:

$A_i \equiv \prod_k f(a_k)$, where

$$f(a_k) \equiv \begin{cases} \bar{a}_k, & \text{if the $k$th bit of the binary representation of } i = 0, \\ a_k, & \text{if the $k$th bit of the binary representation of } i = 1, \end{cases}$$

where $\bar{a}_k$ denotes $1 - a_k$. Thus the $A_i$ are the $2^n$ state probabilities with no Boolean constraint.

$A_{Si} \equiv \prod_{k \in S} f(a_k)$, as above, except only a subspace of size $2^l$ is covered, where $l$ is the size of the
given subset $S$ of literals.

$H(B) \equiv -\sum_{B=1} cA_i \log cA_i$, where $1/c = \sum_{B=1} A_i$. This is the Shannon entropy of the probabilities
$A_i$ normalized to unity over those states $i$ for which $B = 1$.

$H_n(B) \equiv -\sum_{B=1} A_i \log A_i$. This nonnormalized entropy is not a true entropy, since the $A_i$ are not
normalized to unity over the states $i$ for which $B = 1$.

$H_S(B) \equiv -\sum_{B=1} cA_{Si} \log cA_{Si}$. This is the entropy over the subspace $S$ spanned by those variables
occurring in $B$. $S$ in $A_{Si}$ is the set of those variables.

$H_S \equiv -\sum_{i=1}^{2^l} A_{Si} \log A_{Si}$, where $S$ is a given subset of $l$ Boolean literals of the $n$ Boolean literals
occurring in the complete problem. Note that $H_S$ can be computed in $O(l)$ steps by using the identity
unconstrained independent literals below.

Entropy identities needed:

negation: $H_n(\bar{B}) = H_n(1) - H_n(B)$.

normalization: $H(B) = cH_n(B) - \log c$, where $1/c = \sum_{B=1} A_i = P(B)$.

independent conjunction: $H_S(B \,\&\, C) = H_S(B) + H_S(C)$, if $B$ and $C$ contain no common literals, and
are thus statistically independent.

nonoverlapping disjunction: $H_n(B \vee C) = H_n(B) + H_n(C)$, if $B$ and $C$ are mutually exclusive.

unconstrained independent literals: $H_S = -\sum_{i \in S} (a_i \log a_i + \bar{a}_i \log \bar{a}_i)$.


Algorithm: 

The  following  is  a  statement  of  the  efficient  entropy  algorithm: 

(1) Express $H(B)$ in terms of $H_n(B)$ using normalization.

(2) Express $H_n(B)$ in terms of $H_n(\bar{B})$ by using negation.

(3) Express $H_n(\bar{B})$ as a sum of terms of the form $H_n(D)$, where $D$ denotes each disjunct of $\bar{B}$.

(4) Express each $H_n(D)$ in terms of $H(D)$ by using normalization.

(5) Express each $H(D)$ as $H_S(D) + H_{\bar{S}}$, where $\bar{S}$ is the set of Boolean literals not occurring in
$D$ but occurring in the complete problem.

(6) Express each $H_S(D)$ as a sum of the $H_S(C)$ over the various conjuncts $C$ of $D$, using
independent conjunction.

(7) For each literal $C$, set $H_S(C) = 0$.

(8) For each nonliteral $C$, apply the whole algorithm recursively to $H_S(C)$, since it has the
form of $B$.

(9) Expand any $H_{\bar{S}}$ terms produced by Step (5), using unconstrained independent literals.

Example: 

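Suppose we wish to compute the entropy $H(\overline{1 \,\&\, \bar{2} \,\&\, 3 \vee \bar{3}})$ of the set of 16 states of the four
propositions $x_1, x_2, x_3, x_4$ with a priori probabilities $a_1, a_2, a_3, a_4$, respectively, constrained by the
Boolean relation $\overline{x_1 \,\&\, \bar{x}_2 \,\&\, x_3 \vee \bar{x}_3} = 1$. (Note the shorthand: $i$ for $x_i$, and $\bar{a} = 1 - a$ for numerical
expressions $a$ only.)

Step 1:

$$H(\overline{1 \,\&\, \bar{2} \,\&\, 3 \vee \bar{3}}) = cH_n(\overline{1 \,\&\, \bar{2} \,\&\, 3 \vee \bar{3}}) - \log c,$$

where $1/c = P(\overline{1 \,\&\, \bar{2} \,\&\, 3 \vee \bar{3}}) = 1 - a_1 \bar{a}_2 a_3 - \bar{a}_3$.

Step 2:

$$H_n(\overline{1 \,\&\, \bar{2} \,\&\, 3 \vee \bar{3}}) = H_n(1) - H_n(1 \,\&\, \bar{2} \,\&\, 3 \vee \bar{3}).$$

Here $H_n(1)$ is the unconstrained entropy $H_{\{1,2,3,4\}}$ of all four literals.

Step 3:

$$H_n(1 \,\&\, \bar{2} \,\&\, 3 \vee \bar{3}) = H_n(1 \,\&\, \bar{2} \,\&\, 3) + H_n(\bar{3}).$$

Step 4:

$$H_n(1 \,\&\, \bar{2} \,\&\, 3) = \frac{H(1 \,\&\, \bar{2} \,\&\, 3) + \log c'}{c'}, \quad \text{where } 1/c' = P(1 \,\&\, \bar{2} \,\&\, 3) = a_1 \bar{a}_2 a_3, \text{ and}$$

$$H_n(\bar{3}) = \frac{H(\bar{3}) + \log c''}{c''}, \quad \text{where } 1/c'' = P(\bar{3}) = \bar{a}_3.$$

Step 5:

$$H(1 \,\&\, \bar{2} \,\&\, 3) = H_S(1 \,\&\, \bar{2} \,\&\, 3) + H_{\{4\}} \quad \text{and} \quad H(\bar{3}) = H_S(\bar{3}) + H_{\{1,2,4\}}.$$

Step 6:

$$H(1 \,\&\, \bar{2} \,\&\, 3) = H_S(1) + H_S(\bar{2}) + H_S(3) + H_{\{4\}}.$$

Steps 7 and 9:

$$H(1 \,\&\, \bar{2} \,\&\, 3) = H_{\{4\}} = -[a_4 \log a_4 + \bar{a}_4 \log \bar{a}_4], \text{ and}$$

$$H(\bar{3}) = H_{\{1,2,4\}} = -[a_1 \log a_1 + \bar{a}_1 \log \bar{a}_1 + a_2 \log a_2 + \bar{a}_2 \log \bar{a}_2 + a_4 \log a_4 + \bar{a}_4 \log \bar{a}_4].$$

Now, given the $a_i$ values, one can substitute backwards through the above equations to obtain
$H(\overline{1 \,\&\, \bar{2} \,\&\, 3 \vee \bar{3}})$. For example, if $a_1 = a_2 = a_3 = a_4 = 0.5$, we obtain $H(\overline{1 \,\&\, \bar{2} \,\&\, 3 \vee \bar{3}}) =
1 + \log 3$. We can readily verify this special case by noting that this problem has six states of equal
nonzero probability 1/6. Thus its entropy is $6\,(\tfrac{1}{6} \log 6) = \log 6 = 1 + \log 3$.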
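
The back-substitution can be checked numerically. The sketch below is an illustration, not the FIS code. It assumes the constraint of this example is read as "not (($x_1$ and not $x_2$ and $x_3$) or not $x_3$)", and the function names and the dictionary `a` of a priori probabilities are hypothetical. The first function follows the decomposition step by step; the second enumerates the 16 states directly as in Eq. (2).

```python
import math
from itertools import product

def h_lit(a, S):
    """Unconstrained entropy H_S of the independent literals in S (in bits)."""
    return -sum(a[i] * math.log2(a[i]) + (1 - a[i]) * math.log2(1 - a[i]) for i in S)

def entropy_decomposed(a):
    """Entropy under the assumed constraint not((x1 & ~x2 & x3) v ~x3)."""
    p1 = a[1] * (1 - a[2]) * a[3]                        # 1/c'  = P(1 & ~2 & 3)
    p2 = 1 - a[3]                                        # 1/c'' = P(~3)
    hn_d1 = p1 * (h_lit(a, [4]) - math.log2(p1))         # Steps 4-7, 9 for the first disjunct
    hn_d2 = p2 * (h_lit(a, [1, 2, 4]) - math.log2(p2))   # same for the second disjunct
    hn_b = h_lit(a, [1, 2, 3, 4]) - (hn_d1 + hn_d2)      # Steps 2 and 3: negation, disjunction
    pb = 1 - p1 - p2                                     # 1/c = P(B)
    return hn_b / pb + math.log2(pb)                     # Step 1: c*Hn(B) - log c

def entropy_bruteforce(a):
    """Direct enumeration of the feasible states, normalized to unit total."""
    weights = []
    for x in product([False, True], repeat=4):
        x1, x2, x3, _ = x
        if not ((x1 and not x2 and x3) or not x3):
            w = 1.0
            for i, xi in zip((1, 2, 3, 4), x):
                w *= a[i] if xi else 1 - a[i]
            weights.append(w)
    total = sum(weights)
    return -sum(w / total * math.log2(w / total) for w in weights)

half = {1: 0.5, 2: 0.5, 3: 0.5, 4: 0.5}
h = entropy_decomposed(half)   # approximately log2(6)
```

With all probabilities equal to 0.5 both functions give $\log_2 6 = 1 + \log_2 3$, and they also agree for unequal probabilities strictly between 0 and 1.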

The complexity of the entropy algorithm is closely related to the complexity of the probability
algorithm of Section 3.3, since both use the same Boolean manipulation algorithm as their first stage.
Whereas the probability algorithm introduces no significant complexity beyond that of the Boolean
algorithm, the entropy algorithm does so in Step 9. Every disjunct encountered in the output of the
Boolean algorithm invokes an application of unconstrained independent literals, which requires only
$2l$ operations, where $l$ is the number of literals occurring in the problem but not in the disjunct.
Therefore, if there are $d$ disjuncts, $O(d\,l_{\max})$ computation is introduced beyond the Boolean
computation, where $l_{\max}$ is the maximum $l$ over all the disjuncts. Since $l_{\max}$ is bounded by the total number
$n$ of literals, this complexity becomes $O(dn)$. But $d = O(a^a)$, the number of terms generated in Step
4 of the Boolean algorithm in Section 3.3. Therefore, the total complexity added by the numerical
part of the entropy algorithm is $O(a^a n)$, which is the same as that of the Boolean algorithm. Thus a
total worst-case complexity of the entire entropy algorithm is $O(a^a n)$, although, as mentioned in
Section 3.3, the typical case is far better than this, often approximately $a^2 n$.

5.  FIS:  AN  IMPLEMENTED  DIAGNOSTIC  SYSTEM 

As  mentioned  earlier,  these  ideas  have  been  incorporated  into  a  working  diagnostic  system  that 
we  have  named  FIS.  This  section  gives  a  brief  overview  of  FIS  from  a  user’s  point  of  view. 

The  major  components  of  FIS,  as  well  as  its  two  principal  users,  are  illustrated  in  Fig.  6.  We 
describe  the  components  chronologically  as  they  are  used.  First,  the  knowledge  engineer,  whose 
principal  expertise  is  a  detailed  understanding  of  the  type  of  equipment  to  be  diagnosed,  describes  the 
UUT  to  the  computer.  This  is  done  from  a  computer  terminal  via  the  knowledge  acquisition  interface 
(KAI),  producing  the  UUT  description,  which  is  then  stored  in  a  file.  A  principal  concern  in  the 
design  of  the  KAI  is  maximizing  the  ease  with  which  a  user  with  a  good  electronics  background  but 
little familiarity with computer science can generate a description of the UUT that will yield
acceptable diagnostic performance in FIS. This description contains information such as a priori rates of
failure  of  the  replaceable  modules,  costs  (primarily  in  time)  of  setup  and  test  operations,  connectivity 
and  qualitative  functional  descriptions,  a  set  of  allowed  tests,  and  a  graphics  description  for  displaying 
a  block  diagram  of  the  UUT. 

The  Electronics  Library  is  primarily  intended  to  store  partial  or  complete  qualitative  functional 
descriptions  of  modules  that  occur  frequently  in  the  UUTs  being  described.  The  Electronics  Library 
is to be referred to whenever possible while using the KAI to enhance speed. The resulting
approximate module descriptions can then be edited by using the KAI. Also, the KAI can be used to add,
delete  or  modify  items  in  the  library.  After  its  completion,  the  UUT  description  is  processed  by  the 
knowledge  compiler.  This  produces  a  file  containing  the  compiled  UUT  knowledge. 

The  knowledge  compiler  transforms  knowledge  from  a  form  suitable  for  editing  to  a  form
suitable  for  efficient  diagnostic  computation.  It  is  run  once  the  UUT  description  is  complete
and  stores  the  compiled  UUT  description  in  a  file.

Once  we  have  compiled  UUT  descriptions  available,  the  diagnostic  reasoning  subsystem  can  be 
invoked  to  perform  a  variety  of  diagnostic  tasks.  The  diagnostic  reasoning  subsystem  uses  a  compiled 
UUT  description  to  dynamically  construct  and  maintain  a  belief  model  about  what  is  properly  and 
improperly  functioning  as  test  results  become  known.  This  belief  model  in  turn  is  used  to  find  the 
best  test  to  make  next  or  to  recommend  the  replacement  of  some  module  of  the  UUT.

We  currently  provide  two  user  interfaces  to  the  diagnostic  subsystem.  The  first  is  a  “mixed
initiative”  interface  intended  to  be  used  by  a  technician  as  an  aid  during  a  diagnostic  troubleshooting
session.  In  this  mode,  FIS  can

•  Update  the  current  beliefs  about  the  UUT  based  on  a  technician’s  entry  of  test  results; 

•  Respond  to  a  technician’s  query  regarding  the  probability  of  a  fault  hypothesis,  the  merit  of  a 
test,  the  UUT  description  or  the  belief  state  of  FIS; 

•  Make  suggestions  and  recommendations  about  the  next  best  action  to  take  (further  tests  or 
replacements). 
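
The three capabilities above can be illustrated with a deliberately simplified Python sketch. It assumes exactly one faulty module and deterministic tests (a test fails if and only if the faulty module is in its covered set). FIS itself does not make the single-fault assumption, so this is not the FIS algorithm. All function names, module names, and numbers are hypothetical.

```python
import math


def entropy(beliefs):
    """Shannon entropy (bits) of a fault-probability distribution."""
    return -sum(p * math.log2(p) for p in beliefs.values() if p > 0)


def update(beliefs, covered, failed):
    """Condition the beliefs on a test result.

    covered: set of modules whose fault would make the test fail.
    failed:  True if the technician reports that the test failed.
    """
    kept = {m: p for m, p in beliefs.items() if (m in covered) == failed}
    total = sum(kept.values())
    if total == 0:
        raise ValueError("test result is inconsistent with current beliefs")
    return {m: kept.get(m, 0.0) / total for m in beliefs}


def best_test(beliefs, tests):
    """Return the test with the largest expected entropy reduction per unit cost.

    tests: dict mapping test name -> (covered set, cost).
    """
    h0 = entropy(beliefs)
    best, best_score = None, 0.0
    for name, (covered, cost) in tests.items():
        expected = 0.0
        for failed in (True, False):
            p_outcome = sum(p for m, p in beliefs.items() if (m in covered) == failed)
            if p_outcome > 0:
                expected += p_outcome * entropy(update(beliefs, covered, failed))
        score = (h0 - expected) / cost
        if score > best_score:
            best, best_score = name, score
    return best


# Hypothetical belief state and test set.
beliefs = {"A": 0.5, "B": 0.25, "C": 0.25}
tests = {"t1": ({"A"}, 1.0), "t2": ({"A", "B"}, 1.0)}
suggestion = best_test(beliefs, tests)            # recommendation
after_pass = update(beliefs, tests["t1"][0], failed=False)  # entered result
```

A query for the probability of a fault hypothesis is then a lookup such as `after_pass["B"]`. Dividing the expected information gain by test cost is one common heuristic and is used here only as an assumption about how merit might be scored.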

The  second  mode  uses  the  diagnostic  subsystem  to  produce  a  traditional  test  tree  by  invoking  the
next  best  action  generator,  following  its  advice  (hypothetically),  and  recursively  invoking  the  next  best
action  generator  for  each  of  the  possible  outcomes  of  the  previously  recommended  action.  In  this
way,  large,  complex  test  trees  are  generated  automatically  with  no  human  intervention.
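
The recursion described above can be sketched in a few lines of Python. As a simplification that FIS does not make, the sketch assumes a single faulty module and pass/fail tests that fail exactly when the faulty module is in the test's covered set. Test cost is ignored. The function names and the toy data are hypothetical.

```python
import math


def entropy(beliefs):
    return -sum(p * math.log2(p) for p in beliefs.values() if p > 0)


def condition(beliefs, covered, failed):
    """Keep the candidates consistent with the outcome and renormalize."""
    kept = {m: p for m, p in beliefs.items()
            if (m in covered) == failed and p > 0}
    total = sum(kept.values())
    return {m: p / total for m, p in kept.items()}


def choose_test(beliefs, tests):
    """Pick the lowest expected-entropy test among those that split the candidates."""
    best, best_h = None, None
    for name in sorted(tests):
        covered = tests[name]
        failing = condition(beliefs, covered, True)
        passing = condition(beliefs, covered, False)
        if not failing or not passing:
            continue  # outcome already determined, so the test is uninformative
        p_fail = sum(p for m, p in beliefs.items() if m in covered)
        h = p_fail * entropy(failing) + (1 - p_fail) * entropy(passing)
        if best_h is None or h < best_h:
            best, best_h = name, h
    return best


def build_tree(beliefs, tests):
    """Follow the recommended action hypothetically for every possible outcome."""
    test = choose_test(beliefs, tests) if len(beliefs) > 1 else None
    if test is None:
        # No informative test remains: recommend replacing the remaining candidates.
        return {"replace": sorted(beliefs)}
    covered = tests[test]
    return {
        "test": test,
        "fail": build_tree(condition(beliefs, covered, True), tests),
        "pass": build_tree(condition(beliefs, covered, False), tests),
    }


# Hypothetical priors and test coverage.
tree = build_tree({"A": 0.5, "B": 0.25, "C": 0.25},
                  {"t1": {"A"}, "t2": {"A", "B"}})
```

Each recursive call works on a strictly smaller candidate set, so the recursion terminates. A leaf that lists more than one module corresponds to a group of modules that the available tests cannot distinguish.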

We  have  discussed,  but  have  not  currently  implemented,  several  other  possible  user  interfaces 
including  a  tutorial  interface  for  use  in  teaching  troubleshooting,  and  a  testability  interface  for
evaluating  proposed  systems  early  in  the  design  phase.

6.  CURRENT  APPLICATIONS  OF  FIS

We  have  tested  and  refined  FIS  on  real  analog  UUTs  to  assess  its  performance  and  practicality 
and  to  provide  ideas  for  improvements.  We  have  modelled  a  radar  receiver/exciter  subsystem  to
generate  test  trees  (called  diagnostic  and  functional  flowcharts)  that  are  comparable  in  quality  to  those
written  by  test  programmers.  As  a  second  application,  we  have  applied  FIS  to  a  Navy  sonar  system 
as  a  technician’s  aide. 

In  the  radar  application,  the  goal  is  to  demonstrate  that  FIS  can  relieve  the  human  test
programmer  of  part  of  the  labor  in  producing  a  test  program  set  (TPS),  which  is  a  program  that  controls  ATE
gear  in  the  automatic  execution  of  a  test  tree.  The  automatic  test  tree  generation  capability  of  FIS  is
used  in  this  application.  The  part  of  the  human  effort  that  is  relieved  is  the  sequencing  of  the  possible
tests  and  the  determination  of  when  replacement  is  warranted  and  of  which  module(s).  In
exchange,  FIS  places  a  modest  knowledge  acquisition  burden  on  the  user,  primarily  in  the  form  of
causal  rules  and  test  cost  information.  Other  parts  of  this  task  that  we  leave  under  human  control  are
the  choice  of  test  equipment  and  the  determination  of  what  constitutes  a  set  of  tests  sufficient  both  to
certify  that  the  UUT  is  performing  correctly  and  to  do  fault  isolation.  Figure  7  shows  a  hardcopy
version  of  the  more  detailed  CRT  color  graphics  display  of  the  radar  unit.  This  is  useful  during  the
development  of  the  UUT  description,  when  the  technician’s  aide  mode  is  invoked  to  interactively  test
the  diagnostic  performance  before  test  trees  are  generated.


Fig.  7  —  An  approximation  of  the  color  FIS  UUT  display

This  application  has  suggested  some  refinements  in  FIS  to  make  it  practical  for  TPS
applications.  By  improving  the  speed  and  accuracy  of  the  best  test  computation  and  implementing  a  more
intelligent  handling  of  passed  tests,  FIS  now  generates  fault  isolation  trees  that  are  as  good  as  or  better
than  the  existing  ones  generated  manually.

The  second  application  involves  the  use  of  FIS  as  a  maintenance  advisor  to  a  technician.  It  is
installed  in  a  rugged  portable  computer  and  directs  or  assists  a  technician  in  troubleshooting  a
complex,  primarily  analog  sonar  subsystem  consisting  of  105  replaceable  modules,  using  manual  test
equipment.  The  primary  functions  are  the  recommendation  of  a  next  best  test  and  the  reporting  of
FIS  beliefs  after  a  test  is  made.

In  addition  to  these  activities  in  which  we  are  directly  involved,  we  have  also  initiated  a
technology  transfer  program  in  which  we  provide  current  versions  of  FIS  to  approximately  25  sites  in  other
government  laboratories  and  private  industry.  Our  goal  is  to  make  people  aware  of  the  potential  of
AI-based  diagnostic  systems  like  FIS  and  to  receive  feedback  concerning  their  perceptions  of  the
usefulness  of  FIS.

7.  CONCLUSIONS 

The  goal  of  this  research  effort  has  been  to  exploit  ideas  from  the  area  of  AI  to  build  effective 
diagnostic  systems.  We  have  achieved  this  goal  by  developing  a  knowledge  representation  technique 
called  qualitative  causal  modeling,  which  has  the  property  that  sufficient  behavioral  knowledge  of  a 
UUT  can  be  captured  without  high  knowledge  acquisition  costs  to  allow  generation  of  efficient
diagnostic  sequences.  This  representation  technique  is  complemented  by  a  set  of  efficient  algorithms  for
computing  fault  probabilities,  recommending  tests,  and  making  module  replacements. 

This  approach  provides  two  features  that  are  not  part  of  most  current  diagnostic  systems:  (a)  no
single-fault  assumption,  and  (b)  the  ability  to  dynamically  decide  (during  fault  isolation)  what  the
next  best  test  should  be.

In  addition,  this  research  has  introduced  efficient  new  algorithms  for  two  general  problems:  (a)
computing  the  probability  that  a  given  conjunctive  normal  form  Boolean  expression  is  true,  given
statistically  independent  literals,  and  (b)  computing  the  Shannon  entropy  of  such  a  set  of  literals,  given
that  such  a  Boolean  function  is  true.  These  were  motivated  by  the  goal  of  avoiding  computational
bottlenecks  everywhere  in  the  FIS  system  when  scaling  up  to  large  systems.  This  has  been  achieved.
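
To make the two problems concrete, the following Python sketch solves both by brute-force enumeration of all truth assignments. It is exponential in the number of variables and is therefore only a reference definition of the quantities, not the efficient algorithms of this report. The function names, the clause encoding, and the variable names are hypothetical.

```python
import itertools
import math


def satisfying_assignments(cnf, p):
    """Yield (assignment, weight) for each truth assignment that satisfies cnf.

    cnf: list of clauses; a clause is a list of (variable, wanted_value) pairs.
    p:   dict mapping each variable to its independent probability of being True.
    """
    names = sorted(p)
    for values in itertools.product([False, True], repeat=len(names)):
        assignment = dict(zip(names, values))
        if all(any(assignment[v] == wanted for v, wanted in clause)
               for clause in cnf):
            weight = 1.0
            for n in names:
                weight *= p[n] if assignment[n] else 1.0 - p[n]
            yield assignment, weight


def probability_true(cnf, p):
    """Probability that the CNF expression is true."""
    return sum(w for _, w in satisfying_assignments(cnf, p))


def entropy_given_true(cnf, p):
    """Shannon entropy (bits) of the joint assignment, given that cnf is true."""
    total = probability_true(cnf, p)
    if total == 0:
        raise ValueError("expression cannot be true under p")
    h = 0.0
    for _, w in satisfying_assignments(cnf, p):
        if w > 0:
            q = w / total
            h -= q * math.log2(q)
    return h


# Hypothetical example: a failed test implies "A is faulty OR B is faulty".
p = {"A_faulty": 0.5, "B_faulty": 0.5}
cnf = [[("A_faulty", True), ("B_faulty", True)]]
prob = probability_true(cnf, p)
h = entropy_given_true(cnf, p)
```

In this example three of the four assignments satisfy the clause and are equally likely, so the probability is 3/4 and the conditional entropy is log2(3) bits.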

Our  current  efforts  are  primarily  focused  on  making  minor  improvements  to  FIS  based  on  user 
feedback.  However,  we  have  targeted  FIS  as  an  opportunity  to  exploit  some  of  the  recent  advances 
in  machine  learning.  It  is  clear  that  any  UUT  model  needs  to  be  continually  refined  as  failure  rates 
change,  systems  age,  etc.  Our  goal  is  to  use  machine-learning  techniques  to  automate  this  refinement 
without  the  need  for  significant  human  intervention. 